657 research outputs found

    Deep Span Representations for Named Entity Recognition

    Full text link
    Span-based models are one of the most straightforward methods for named entity recognition (NER). Existing span-based NER systems shallowly aggregate the token representations to span representations. However, this typically results in significant ineffectiveness for long-span entities, a coupling between the representations of overlapping spans, and ultimately a performance degradation. In this study, we propose DSpERT (Deep Span Encoder Representations from Transformers), which comprises a standard Transformer and a span Transformer. The latter uses low-layered span representations as queries, and aggregates the token representations as keys and values, layer by layer from bottom to top. Thus, DSpERT produces span representations of deep semantics. With weight initialization from pretrained language models, DSpERT achieves performance higher than or competitive with recent state-of-the-art systems on eight NER benchmarks. Experimental results verify the importance of the depth for span representations, and show that DSpERT performs particularly well on long-span entities and nested structures. Further, the deep span representations are well structured and easily separable in the feature space

    Semi-supervised Deep Generative Modelling of Incomplete Multi-Modality Emotional Data

    Full text link
    There are threefold challenges in emotion recognition. First, it is difficult to recognize human's emotional states only considering a single modality. Second, it is expensive to manually annotate the emotional data. Third, emotional data often suffers from missing modalities due to unforeseeable sensor malfunction or configuration issues. In this paper, we address all these problems under a novel multi-view deep generative framework. Specifically, we propose to model the statistical relationships of multi-modality emotional data using multiple modality-specific generative networks with a shared latent space. By imposing a Gaussian mixture assumption on the posterior approximation of the shared latent variables, our framework can learn the joint deep representation from multiple modalities and evaluate the importance of each modality simultaneously. To solve the labeled-data-scarcity problem, we extend our multi-view model to semi-supervised learning scenario by casting the semi-supervised classification problem as a specialized missing data imputation task. To address the missing-modality problem, we further extend our semi-supervised multi-view model to deal with incomplete data, where a missing view is treated as a latent variable and integrated out during inference. This way, the proposed overall framework can utilize all available (both labeled and unlabeled, as well as both complete and incomplete) data to improve its generalization ability. The experiments conducted on two real multi-modal emotion datasets demonstrated the superiority of our framework.Comment: arXiv admin note: text overlap with arXiv:1704.07548, 2018 ACM Multimedia Conference (MM'18

    U-shaped fusion convolutional transformer based workflow for fast optical coherence tomography angiography generation in lips

    Get PDF
    Oral disorders, including oral cancer, pose substantial diagnostic challenges due to late-stage diagnosis, invasive biopsy procedures, and the limitations of existing non-invasive imaging techniques. Optical coherence tomography angiography (OCTA) shows potential in delivering non-invasive, real-time, high-resolution vasculature images. However, the quality of OCTA images are often compromised due to motion artifacts and noise, necessitating more robust and reliable image reconstruction approaches. To address these issues, we propose a novel model, a U-shaped fusion convolutional transformer (UFCT), for the reconstruction of high-quality, low-noise OCTA images from two-repeated OCT scans. UFCT integrates the strengths of convolutional neural networks (CNNs) and transformers, proficiently capturing both local and global image features. According to the qualitative and quantitative analysis in normal and pathological conditions, the performance of the proposed pipeline outperforms that of the traditional OCTA generation methods when only two repeated B-scans are performed. We further provide a comparative study with various CNN and transformer models and conduct ablation studies to validate the effectiveness of our proposed strategies. Based on the results, the UFCT model holds the potential to significantly enhance clinical workflow in oral medicine by facilitating early detection, reducing the need for invasive procedures, and improving overall patient outcomes.</p
    corecore